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Abstract Recent years have witnessed the popularity of using rank mini- 
O ' mization as a regularizer for various signal processing and machine learning 

^ , problems. As rank minimization problems are often converted to nuclear norm 

minimization (NNM) problems, they have to be solved iteratively and each it- 
\^ , eration requires computing a singular value decomposition (SVD). Therefore, 

' their solution suffers from the high computation cost of multiple SVDs. To 

relieve this issue, we propose using the block Lanczos method to compute the 
partial SVDs, where the principal singular subspaces obtained in the previous 
iteration are used to start the block Lanczos procedure. To avoid the expensive 
reorthogonalization in the Lanczos procedure, the block Lanczos procedure is 
C/3 , performed for only a few steps. Our block Lanczos with warm start (BLWS) 

technique can be adopted by different algorithms that solve NNM problems. 
We present numerical results on applying BLWS to Robust PC A and Matrix 
04 ' Completion problems. Experimental results show that our BLWS technique 

, usually accelerates its host algorithm by at least two to three times. 

Keywords Lanczos Method • Singular Value Decomposition • Eigenvalue 
I Decomposition • Rank Minimization • Nuclear Norm Minimization 

1 Introduction 

In recent years, there is a surge of applying rank minimization as a regularizer 
to various machine learning and signal processing problems [22..5. 23.25..27II171 
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[T? l lT5 1 l5 1 g l lTB l [Ti l l ^ [T | [Tn| . In the mathematical models of these problems, the 
rank of some matrix is often required to be minimized. Typical models are 
Robust PC A (RPCA) [22]: 

(RPCA) minrank(A) + A||i;||,o, s.t. D = A + E, (1) 

and Matrix Completion (MC) JS]: 

(MC) minrankM), s.t. D = TTn(A), (2) 

A 

where ||-E||io is the number of nonzeros in E, f2 is the set of indices of known 
entries in A and 7rj7 is the restriction onto fi. There are variations of RPCA 
PSIIS] and MC g], and there is also a combination of RPCA and MC [3]. 

Due to the effectiveness of rank minimization, many researchers have pro- 
posed various algorithms to solve rank minimization problems pTl[^ [T51[T^ [71 
milSnilS]- As rank minimization problems are usually NP hard, most of them 
aim at solving companion convex programs instead, by replacing the rank 
function with the nuclear norm || • ||,, i.e., the sum of the singular values, and 
the Iq norm with the li norm, i.e., the sum of the absolute values of the en- 
tries. This is suggested by the fact that the nuclear norm and li norm are the 
convex envelopes of the rank function |19) and the /q norm, respectively. Some 
researchers have proven that for RPCA and MC problems solving the compan- 
ion convex program can produce the same solution to the original problem at 
an overwhelming probability |19l Bll5]. As a result, solving a rank minimization 
problem is often converted into solving a nuclear norm minimization (NNM) 
problem, in order to exploit the efficiency of convex programs. 

Whichever of the existing methods that solve the NNM problems is used, 
one always has to solve the following subproblem: 

A,+i =argmine,P||, + i||A- Willi, (3) 

A ^ 

where Si and Wi change along iteration and || -Wf is the Frobenius norm. Cai 
et al. [2] proved that the solution to ([3]) can be obtained by singular value 
thresholding: 

A,+i - Te, {w,) = u,e,, {s.,W, (4) 

where Oe{x) = sgn(x) max(|a;| — e,0) is a shrinkage operator and UiSiV^ 
is the singular value decomposition (SVD) of Wi. Therefore, it is easy to 
see that NNM problems are usually computationally costly as they require 
solving SVDs multiple times and an SVD typically requires 0(p^) operations, 
where p = min(m, n) and m x n is the size of the matrix. Fortunately, it is 
apparent that all the singular values/vectors need not be computed because 
the singular values smaller than the threshold £i will be shrunk to zeros hence 
their associated singular vectors will not contribute to Ai+i. This leads to a 
common practice in solving NNM problems, namely using PROPACK [TT] to 
compute the partial SVD of Wi , where only those leading singular values that 
are greater than Si, and their associated singular vectors, are computed. This 
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significantly brings down the computation complexity from 0(j)^) to 0{rp^), 
where r is the number of leading singular values/vectors computed. 

Although computing the partial SVD instead already saves the computa- 
tion significantly, the 0{rp^) complexity is still too high for large scale prob- 
lems. Therefore, any further savings in the computation are valuable when 
the problem scale becomes large. In this paper, we aim at exploiting the rela- 
tionship between successive iterations to further bring down the computation 
cost. Our technique is called the block Lanczos with warm start (BLWS), which 
uses the block Lanczos method to solve the partial SVD and the block Lanc- 
zos procedure is initialized by the principal singular subspaces of the previous 
iteration. The number of steps in the block Lanczos procedure is also kept 
small. Our BLWS technique can work in different algorithms for NNM prob- 
lems. Our numerical tests show that BLWS can speed up its host algorithm 
by at least two to three times. 

To proceed, we first introduce how the partial SVD is computed in PRO PACK. 



2 The Lanczos Method for the Partial SVD 

PRO PACK uses the Lanczos method to compute the partial SVD. As the 
method is based on the Lanczos method for partial eigenvalue decomposition 
(EVD), we have to start with the partial EVD computation. 

The Lanczos method for partial EVD is to find the optimal leading eigen- 
subspace of a symmetric matrix in a Krylov subspace [5] : 



K{W, qi,k) ^ span{(ji, Wqi 



(5) 



The orthonormal basis Qk of K{W,qi,k) can be efficiently computed via a 
so-called Lanczos procedure shown in Algorithm [1] Accordingly, W can be 
approximated as ~ QkTkQ]^ , where is a tri-diagonal matrix: 



Tk 



/«i I3i 
Pi a.2 

Vo 



\ 



Pk-1 ak ) 



(6) 



The Lanczos procedure is actually derived by comparing the left and right 
hand sides of WQk ~ QkTk (cf. Section 

After the Lanczos procedure is iterated for fc — 1 times, the EVD of Tk is 
computed: Tk = VkAkVj^ . Then W w iQkVk)Ak{QkVk)'^ . Suppose the eigen- 
values in Ak is ordered from large to small. Then the r largest eigenvalues of 
W can be approximated by the first r eigenvalues in Ak (called the Ritz values 
of W) and the leading r eigenvectors of W can be approximated by the first r 
columns of QkVk (called the Ritz vectors of W). 



Actually it can also find the tailing eigen-subspace of W. 
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Algorithm 1 The Lanczos Procedure 

Input: m X m symmetric matrix W , k. 

1. Initialization: ro = gi ; /3o = 1; 90 = 0; i = 0. 

2. 

for ( = 1 : A; — 1 do 

<?i+l = n/A; 1 = 1 + 1; ai = qfWqi; 
ri = Wqi - aiQi - I3i_iqi_i; 
Pi = Ikilh; 
end for 

Output: Qi^ = (gi, ■ ■ ■ , q^.) and Tj. as JBJl. 



When coniputing the partial SVD of a given matrix W, a critical rela- 
tionship between the SVD of W and the EVD of the following augmented 
matrix 

is used. It is depicted by the following theorem [5]. 

Theorem 1 // the SVD of an mxn (m < n) matrix W is W = U SV'^ , then 
the EVD of W is 

/U 0\ 

W^ul0-S0\u^, (8) 
\ 0/ 

where 

t/ = -^(^i ^^^Q^^) and (f/i,t/,) = C/. (9) 

So by computing the EVD of W, the SVD of W can be obtained. 

When computing the SVD of W, the Lanczos method is actually implicitly 
applied to W with the initial vector qi being chosen as 

qi = iu{,0^f, (10) 

in order to exploit the special structure of W. Accordingly, W can be approx- 
imated as « UkBkVjf , where columns of Uk and Vk are orthonormal and 
Bk is bi-diagonal. Then the approximate singular values/vectors of W can be 
obtained after computing the SVD of Bk ■ For more details, please refer to [TT] . 

The Lanczos method has some important properties [5]. First, the Ritz 
values of W converge to the largest eigen/singular values of W quickly when k 
grows, so do the Ritz vectors. Second, as it only requires solving the EVD/SVD 
of a relatively small and banded matrix Tk/Bk, the partial EVD/SVD is usu- 
ally faster than the full EVD/SVD when the required number r of eigen/singular 
vectors is relatively small (e.g., when r < 0.25p). Third, the Lanczos proce- 
dure terminates when an invariant subspace is found. Fourth, the orthogonal- 
ity among the columns of Qk is easily lost when the Lanczos procedure goes 
on. Hence, reorthogonalization is usually necessary when k is relatively large. 
Unfortunately, reorthogonalization is expensive. So PROPACK monitors the 
orthogonality among the columns of Qk and only reorthogonalizes part of the 
columns whose orthogonalities with other columns deteriorate. 
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3 Ideas to Improve 

We notice that if we solve the partial SVD in each iteration independently, the 
Lanczos procedure has to start from a random initial vector qi as no apriori 
information is available. Random initialization makes the size k of Bk un- 
predictable. If qi is not good, k can be relatively large in order for the Ritz 
values/ vectors to achieve a prescribed precision, making the partial SVD in- 
efficient. Actually, during the iterations of optimization, as the matrices Wi 
and Wi-i are close to each other, any of the leading Ritz vectors of Wj_i 
should be good for initializing the Lanczos procedure of Wi. However, a risk 
of this strategy is that the Lanczos procedure may terminate quickly by out- 
putting a small invariant subspace containing the previous Ritz vector because 
the previous Ritz vector is close to be a singular vector of Wi. So the Lanc- 
zos procedure for Wi may fail and has to restart with another initial vectoJl. 
Moreover, initializing with a vector qi neglects the fact that we are actually 
seeking optimal singular subspaces, not a number of individual singular vec- 
tors. A vector definitely cannot record all the information from the previous 
principal singular subspaces (left and right). So, ideally we should use the 
whole previous principal singular subspaces. This motivates us to adopt the 
block Lanczos method for partial SVD, where the block Lanczos procedure 
starts with the previous principal singular subspaces. 



4 Block Lanczos with Warm Start 

Again, we start with the block Lanczos with warm start (BLWS) for partial 
EVD. The block Lanczos method is a natural generalization of the standard 
Lanczos method by replacing the Krylov space K{W,qi, k) with 



KiW,Xi,k) =span{Xi,l^Xi,-- - .W^'^Xi} 



fll) 



where Xi is an orthonornial basis of an initial subspace. Accordingly, the 
Lanczos procedure is generalized to the block Lanczos procedure, which is to 
find an approximation of W: W « QkTkQ^, where is a block tri-diagonal 
matrix 8 : 

/AhBf ... \ 



Si Ah 



(12) 



V ... Bk-i Mk J 

Qk = (Xi, ■ ■ ■ ,Xk), and columns of Qk are orthonormal. By comparing the 
left and right hand sides of WQk ~ QkTk, we have 



WXi = Xi.iB^ 



/-I 



XiMi + Xi+iBi 



; = 1, . . . ,fc - 1, 



(13) 



^ Although in reality the Lanczos procedure seldom terminates due to numerical error, 
our numerical tests show that such choice of initial q\ does not help speeding up. 
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where Bq is defined to be 0. From the orthogonality among the columns of 
Qk, we have that 

Mi = X^WXi, l = l,---,k. (14) 

Moreover, if we define Ri = WXi - XiMi - Xi^iBf_^, then Xi+iBi is the QR 
decomposition of Ri . This leads to the block Lanczos procedure in Algorithm[2] 



Algorithm 2 Block Lanczos Procedure 

Input: m X m symmetric matrix W , m X r orthogonal matrix Xi, k. 

1. Initialization: Afi = X^WXi; Bq = 0. 

2. 

for ( = 1 : fc — 1 do 

Rl = WXi - XiMi - Xi^^Bf^^; 
(Jf;_|_i, _B;) = qr{Ri); (Tiie QR decomposition) 
Mi+i = Xl^{WXi+r, 
end for 

Output: Qfc = (Xi, ■ • • ,Xk) and Tfe as in |[T2l l. 



After the approximation W ~ QkTkQ^ is obtained, one may further com- 
pute the EVD of Tk'-Tk = UkAkU"^ , where the eigenvalues Xi are ordered from 
large to small. Then the leading r eigenvalues and eigenvectors of W is ap- 
proximated by Ai, • • • , Xr, and QkUki-, 1 : r), respectively. The whole process 
is summarized in Algorithm [31 



Algorithm 3 Block Lanczos for Partial EVD 

Input: m X m symmetric matrix W, m X r orthogonal matrix Xi, k. 

1. Compute Qfc and by Algorithm (2] 

2. Compute the EVD of T^: Tf^ = VkAf^V^F , where the eigenvalues on the diagonal of 
are in a decreasing order. 

Output: U = Qfcl4(:,l ■.r),E = 71^(1 : r, 1 : r). 



If we denote the block Lanczos for partial EVD (Algorithm[3l) as BL_EVD{W, Xi,k), 
then our BLWS can be written as: 

(BLWS) ((7,, S^) = BL.EVD{W„ C/,_i, h), 

namely the principal eigen-subspace Ui-i of the previous iteration is used to 
initialize the block Lanczos procedure. 

When using the block Lanczos method to compute the partial SVD of a 
matrix W , similarly the block Lanczos procedure is applied to W shown in 
d?]). Note that W is of special structure. So the block Lanczos procedure can 
be done efficiently by skipping the zero sub-matrices of W . The details are 
trivial. So we omit them. 

With BLWS, compared with the standard Lanczos method, the risk of 
terminating with a small invariant subspace is gone, and the principal eigen- 
subspace can be updated more efficiently. As a result, the whole optimization 
process can be sped up a lot. 
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4.1 More Tricks for Acceleration 

Recall that in the standard Lanczos procedure, the orthogonality among the 
columns of Qfc is easily lost when k grows. So is the block Lanczos procedure. 
As reorthogonalization is expensive, we further require that the number k of 
performing the block Lanczos procedure is small, such that reorthogonaliza- 
tion can be waived. In our experiments, we typically set k = 2, namely the 
block Lanczos procedure is performed only once. Although such a fixed and 
small value of k cannot result in high precision principal singular subspaces 
when the block Lanczos procedure is randomly initialized, it does produce high 
precision principal singular subspaces when the block Lanczos procedure is ini- 
tialized with the previous principal singular subspaces. This is because Wi is 
close to Wi-i. So the previous principal singular subspaces is already close to 
the principal singular subspaces of Wi . Then the block Lanczos procedure im- 
proves them and produce better estimated principal singular subspaces. Note 
that keeping k small has multiple advantages. First, it waives the necessity of 
expensive reorthogonalization. Second, it saves the computation in perform- 
ing the block Lanczos procedure. Third, the SVD of Bk also becomes cheap 
because the size of Bk is small. 

In the standard block Lanczos method for partial SVD, the initial subspace 
is chosen as Xi = {Uji^.^Y or Xi = {.'^,V^_i)'^ (cf. (HU])), where and 
Vi-i are the estimated left and right principal singular subspaces obtained 
in the previous iteration, respectively. However, such an initialization only 
utilizes half of the information provided by the previous iteration. So our 
BLWS technique uses Xi = ■^{U'[_i,Vj'ii)'^ as the initial subspace. In this 
way, the precision of obtained principal singular subspaces is higher when the 
block Lanczos procedure is performed for the same number of steps. 

4.2 Handling Variant Dimensions of Principal Singular Subspaces 

The above exposition assumes that the dimension r of the principal singular 
subspaces is known and fixed along iteration. In reality, r is unknown and has 
to be dynamically predicted before the partial SVD is computed [T^I21[ [^[T5 ] . 
Hence r actually varies along iteration. In this case, BLWS simply outputs Ritz 
values/vectors according to the predicted r in the current iteration and the 
block Lanczos procedure is still initialized with the principal singular subspaces 
output by last iteration. We have observed that for many NNM problems, 
the predicted r quickly stabilizes. So variant dimensions of principal singular 
subspaces at the early iterations do not affect the effectiveness of BLWS. 

5 Experimental Results 

Our BLWS technique is a general acceleration method. Given an algorithm to 
solve a NNM problem, a user only has to replace the SVD computation in the 
algorithm with BLWS and may obtain noticeable speedup. 
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Table 1 BLWS-ADM vs. ADM on different synthetic data. A and E are the computed low 
rank and sparse matrices and A is the ground truth. 



m 


method 


\\A-A\\f. 

IIaiif 


rank{A) 


ll^llio 


#iter 


time{s) 


500 


ADM 


5.27e-006 


50 


25009 


30 


4.07 


500 


BLWS-ADM 


9.64e-006 


50 


25008 


30 


2.07 


1000 


ADM 


3.99e-006 


100 


100021 


29 


23.09 


1000 


BLWS-ADM 


6.05e-006 


100 


100015 


30 


9.25 


2000 


ADM 


2.80e-006 


200 


400064 


28 


154.80 


2000 


BLWS-ADM 


4.30e-006 


200 


400008 


30 


53.37 


3000 


ADM 


2.52e-006 


300 


900075 


28 


477.13 


3000 


BLWS-ADM 


3.90e-006 


300 


900006 


30 


149.19 



As examples, in this section we apply our BLWS technique to two popular 
problems: Robust PCA (RPCA) 22 and Matrix Completion (MC) problems 
[5] . For each problem, we compare the original chosen algorithm and its BLWS 
improved counterpart in the aspect of computation time. The accuracies of 
obtained solutions are also shown in order to ensure that the correct solutions 
are approached. All experiments are run on the same workstation with two 
quad-core 2.53GHz Intel Xeon E5540 CPUs, running Windows Server 2008 
and Matlab (Version 7.7). 

For the RPCA problem, we generate the synthetic data in the same way 
as that in [12 . Namely, A is generated according to the independent random 
orthogonal model [22 , £^ is a sparse matrix whose support is independent and 
the entry values are uniformly distributed in [—500, 500], and D — A + E. For 
simplicity, we only focus on m x m square matrices and the parameter A is 
fixed at 1/ y/m, as suggested by Wright et al. [22 . The value of m is chosen in 
{500, 1000, 2000, 3000}. The rank of A is chosen as 10%m, and the number of 
corrupted entries (i.e., ||i?||io) is 10%m^. We choose the ADM method [MlIT^ 
to solve the PRCA problem. 

The data for the MC problem is generated in the same way as that in [5]. 
Namely, an m x m matrix A with rank r is generated by first sampling two 
m X r factor matrices Ml and Mn independently, each having i.i.d. Gaussian 
entries, and then multiplying them: A — M^M]^. Finally, the set of observed 
entries is sampled uniformly at random. We choose the SVT algorithm |^ to 
solve the MC problem. 

Table 1 shows detailed comparison between ADM and BLWS accelerated 
ADM for solving the RPCA problem. We can see that BLWS-ADM roughly 
costs less than 1/3 time of ADM, achieving the same accuracy, and the total 
number of iterations does not change or only increases slightly. Similar phe- 
nomenon can also be observed in Table 2, which lists the comparison results 
for solving the MC problem. 
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Table 2 BLWS-SVT vs. SVT on different syntfietic data. A is tlie recovered low rank matrix 
and A is the ground truth, m is the size of matrix and s is the number of sampled entries. 
dr = r(2m — r) is the number of degrees of freedom in an m X m matrix of rank r. 



m 


r 


s/dr 


s/m 


algorithm 


time(s) 


#iter 


\\A-A\\p 
IIAIIp 


5000 


10 


6 


0.024 


SVT 


72.57 


123 


1.73e-004 


5000 


10 


6 


0.024 


BLWS-SVT 


20.02 


123 


1.74e-004 


5000 


50 


5 


0.1 


SVT 


438.81 


107 


1.63e-004 


5000 


50 


5 


0.1 


BLWS-SVT 


279.08 


108 


1.55e-004 


5000 


100 


4 


0.158 


SVT 


1783.26 


122 


1.73e-004 


5000 


100 


4 


0.158 


BLWS-SVT 


1175.91 


122 


1.74e-004 


10000 


10 


6 


0.012 


SVT 


135.90 


123 


1.68O-004 


10000 


10 


6 


0.012 


BLWS-SVT 


42.80 


123 


1.70C-004 


10000 


50 


5 


0.050 


SVT 


1156.25 


110 


1.58e-004 


10000 


50 


5 


0.050 


BLWS-SVT 


736.01 


110 


1.60e-004 


20000 


10 


6 


0.006 


SVT 


251.13 


123 


1.74e-004 


20000 


10 


6 


0.006 


BLWS-SVT 


101.47 


124 


1.68e-004 


30000 


10 


6 


0.004 


SVT 


449.34 


124 


1.75C-004 


30000 


10 


6 


0.004 


BLWS-SVT 


171.40 


125 


1.69e-004 



6 Discussions 

Although we have presented numerical results to testify to the effectiveness 
of BLWS, currently we have not rigorously proved the correctness of BLWS. 
We guess that BLWS can work well for most NNM problems. This is due to 
Theorem 9.2.2 of 0, which implies that when there is sufficient gap between 
the r-th and the (r + l)-th eigenvalues, the errors in the Ritz values can be well 
controlled. As NNM problems typically involve singular value thresholding Q, 
such a gap should exist. However, a rigorous proof is still under exploration. 

7 Conclusions 

In this paper, we introduce the block Lanczos with warm start technique to 
accelerate the partial SVD computation in NNM problems. Both the block 
Lanczos procedure and the initialization with the previous principal singular 
subspaces can take full advantage of the information from last iteration. Our 
BLWS technique can work in different algorithms that solve rank minimization 
problems. The experimental results indicate that our BLWS technique usually 
makes its host algorithm at least two to three times faster. 
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